Abstract

Consider the setting of sparse graphs on N vertices, where the vertices
have distinct "names", which are strings of length O(log N) from a fixed
finite alphabet. For many natural probability models, the entropy grows
as cN log N for some model-dependent rate constant c. The mathematical
content of this paper is the (often easy) calculation of c for a variety
of models, in particular for various standard random graph models adapted
to this setting. Our broader purpose is to publicize this particular
setting as a natural setting for future theoretical study of data
compression for graphs, and (more speculatively) for discussion of
unorganized versus organized complexity.
1 Introduction



The concept entropy arises across a broad range of topics within the math- 
ematical sciences, with different nuances and applications. There is a sub- 
stantial literature (see Section 2.2) on topics linking entropy and graphs, 
but our focus seems different from these. In this paper we use the word only 
with its most elementary meaning: for any probability distribution $p = (p_s)$
on any finite set $S$, its entropy is the number

$$\mathrm{ent}(p) = -\sum_{s \in S} p_s \log p_s. \tag{1}$$

For an $S$-valued random variable $X$ we abuse notation by writing ent(X)
for the entropy of the distribution of $X$.

Consider an N-vertex undirected graph. Instead of the usual conventions
about vertex-labels (unlabelled; labeled by a finite set independent of N;
labeled by integers 1, ..., N) our convention is that there is a fixed (i.e.
independent of N) alphabet $\mathbf{A}$ of size $2 \le A < \infty$ and that each vertex has
a different "name", which is a length-O(log N) string $a = (a_1, \ldots, a_m)$ of
letters from $\mathbf{A}$.

We will consider probability distributions over such graphs-with-vertex-names,
in the $N \to \infty$ "sparse graph limit" where the number of edges
is O(N). In other words we study random graphs-with-vertex-names $\mathcal{G}_N$
whose average degree is O(1). In this particular context (see Section 2.1 for
discussion) one expects that the entropy should grow as

$$\mathrm{ent}(\mathcal{G}_N) \sim cN \log N, \tag{2}$$

where c is thereby interpretable as an "entropy rate". Note the intriguing
curiosity that the numerical value of the entropy rate c does not depend on
the base of the logarithms, because there is a "log" on both sides of the
definition (2), and indeed we will mostly avoid specifying the base.

In Section 4 we define and analyze a variety of models for which
calculation of entropy rates is straightforward. In Section 5 we study one more
complicated model. This is the mathematical content of the paper. Our
motivation for studying entropy in this specific setting is discussed verbally in
Section 2, and this discussion is the main conceptual contribution of the
paper. The discussion is independent of the subsequent mathematics but may
be helpful in formulating interesting probability models for future study.
Section 3 gives some (elementary) technical background. Section 6 contains
final remarks and open problems.

2 Remarks on data compression for graphical structures



The well-known textbook [9] provides an account of the classical Shannon 
setting of data compression for sequential data, motivated by English lan- 
guage text modeled as a stationary random sequence. What is the analog 
for graph-structured data? 

This is plainly a vague question. Real-world data rarely consists only of 
the abstract mathematical structure - unlabelled vertices and edges - of a 
graph; typically a considerable amount of context-dependent extra informa- 
tion is also present. Two illustrative examples: 

(i) Phylogenetic trees on species; here part of the data is the names of the 
species and the names of clades; 

(ii) Road networks; here part of the data is the names or numbers of the 
roads and some indication of the locations where roads meet. 

Our setting is designed as one simple abstraction of "extra information", in
which the (only) extra information is the "names" attached to vertices. Note 
that in many examples one expects some association between the names and 
the graph structure, in that the names of two vertices which are adjacent 
will on average be "more similar" in some sense than the names of two 
non-adjacent vertices. This is very clear in the phylogenetic tree example, 
because of the genus-species naming convention. So when we study toy 
probability models later, we want models featuring such association. 

Let us remind the reader of two fundamental facts from information 
theory [9]. 

(a) In the general setting (1), there exists a coding (e.g. Huffman code)
$f_p : S \to \mathbf{B}$ such that, for $X$ with distribution $p$,

$$\mathrm{ent}(p) \le \mathbb{E}\,\mathrm{len}(f_p(X)) \le \mathrm{ent}(p) + 1$$

and no coding can improve on the lower bound. Here $\mathbf{B}$ denotes the set of
finite binary strings $b = b_1 b_2 \cdots b_m$ and len(b) = m denotes the length of
a string and entropy is computed to base 2. Recall that a coding is just a
1-1 function.

(b) In the classical Shannon setting, one considers a stationary ergodic
sequence $X = (X_i)$ with values in a finite alphabet. Such a sequence has an
entropy rate

$$H := \lim_{k \to \infty} k^{-1}\, \mathrm{ent}(X_1, \ldots, X_k).$$

Moreover there exist coding functions $f$ (e.g. Lempel-Ziv) which are
universal in the sense that for every such stationary ergodic sequence,

$$\lim_{m \to \infty} m^{-1}\, \mathbb{E}\,\mathrm{len}(f(X_1, \ldots, X_m)) = H.$$

The important distinction is that in (a) the coding function $f_p$ depends on
the distribution of $X$ but in (b) the coding function $f$ is a function on finite
sequences which does not depend on the distribution of $X$.
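
As a small illustration of fact (a), the following sketch builds a Huffman code for one toy distribution and compares its mean codeword length with the entropy in bits. The distribution `p` and the helper names `entropy_bits` and `huffman_lengths` are illustrative assumptions, not part of the paper.

```python
import heapq
import math


def entropy_bits(p):
    """Shannon entropy (base 2) of a probability vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def huffman_lengths(p):
    """Return the codeword length that a Huffman code assigns to each symbol."""
    if len(p) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(q, i, [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return lengths


p = [0.4, 0.3, 0.2, 0.1]          # toy distribution (assumption)
lengths = huffman_lengths(p)
mean_len = sum(q * l for q, l in zip(p, lengths))
H = entropy_bits(p)
print(lengths, mean_len, H)
```

The code depends on `p`, which is exactly the point of the distinction drawn above: a universal scheme as in (b) would have to work without being told `p`.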

In our setting of graphs with vertex-names we can in principle apply 
(a), but it will typically be very unrealistic to imagine that observed real- 
world data is a realization from some known probability distribution on 
such graphs. At the other extreme, for many reasons one cannot expect 
there to exist, in our setting, "universal" algorithms analogous to (b). For
instance, the vertex-names $(a, a')$ across some edges might be related by a
deterministic cryptographic function. Also note it is difficult to imagine a
definition analogous to "stationary" in our setting. So it seems necessary
to rely on heuristic algorithms for compression, where heuristic means only
that there is no good theoretical guarantee on compressed length. One
could of course compare different heuristic algorithms at an empirical level
by testing them on real-world data. As a theoretical complement, one could
test an algorithm's efficiency by trying to prove that, for some wide range
of qualitatively different probability models for $\mathcal{G}_N$, the algorithm behaves
optimally in the sense of compressing to mean length $(c + o(1)) N \log N$ where
c is the entropy rate (2). The contribution of this paper is to provide a
collection of probability models for which we know the numerical value of c.

2.1 Remarks on the technical setup 

The discussion above did not involve two extra assumptions made in Section
1, that the graphs are sparse and that the length of names is O(log N) (note
the length must be at least order log N to allow the names to be distinct).
These extra assumptions create a more focussed setting for data compression
that is mathematically interesting for two reasons. If the entropies of the
two structural components - the unlabelled graph, and the set of names -
were of different orders, then only the larger one would be important; but
these extra assumptions make both entropies be of the same order, N log N.
So both of these two structural components and their association become
relevant for compression. A second, more technical, reason is that natural
models of sparse random graphs $\mathcal{G}_N$ invariably have a well-defined limit
in the sense of local weak convergence [4, 3] of unlabelled graphs, and the
limit $\mathcal{G}_\infty$ automatically has a property, unimodularity, directly analogous to
stationarity for random sequences. This addresses part of the "difficult
to imagine a definition analogous to stationary in our setting" issue raised
above, but it remains difficult to extend this notion to encompass the
vertex-names.

2.2 Related work 

We have given a verbal argument that the Section 1 setting of sparse graphs 
with vertex-names is a worthwhile setting for future theoretical study of 
data compression in graphical structures. It is perhaps surprising that this 
precise setting has apparently not been considered previously. The large 
literature on what is called "graph entropy", recently surveyed in [10], deals
with statistics of a single unlabelled graph, which is quite different from our 
setting. Data compression for graphs with a fixed alphabet is considered 
in [12]. In a different direction, the case of sequences of length N with 
increasing-sized alphabets is considered in [13, 15]. Closest to our topic is 
[7], discussing entropy and explicit compression algorithms for Erdős-Rényi
random graphs. But all of this literature deals with settings that seem 
"more mathematical" than ours, in the sense of being less closely related to 
compression of real-world graphical structures involving extra information. 

On the applied side, there is considerable discussion of heuristic com- 
pression algorithms designed to exploit expected features of graphs arising 
in particular contexts, for instance WWW links [5] and social networks [6]. 
What we proposed in the previous section as future research is to try to 
bridge the gap between that work and mathematical theory by seeking to 
devise and study general purpose heuristic algorithms. 

On a more speculative note, we have a lot of sympathy with the view 
expressed by John Doyle and co-authors [1], who argue that the "organized
complexity" one sees in real world evolved biological and technological net- 
works is essentially different from the "disorganized complexity" produced 
by probability models of random graphs. At first sight it is unclear how one 
might try to demonstrate this distinction at some statistical level. But pro- 
ducing a heuristic algorithm that codes some class of real-world networks 
to lengths smaller than the entropy of typical probability models of such 
networks would be rather convincing. 






3 A little technical background 

Here are some elementary facts [9] about entropy, in the setting (1) of an
$S$-valued r.v. $X$, which we will use without comment.

$$\mathrm{ent}(X) \le \log |S|$$
$$\mathrm{ent}(X, Y) \le \mathrm{ent}(X) + \mathrm{ent}(Y)$$
$$\mathrm{ent}(X) \ge \mathrm{ent}(h(X)) \text{ for any } h : S \to S'.$$

These inequalities are equalities if and only if, respectively,

X has uniform distribution on S

X and Y are independent

h is 1-1 on the range of X.
Also, if $\theta = \sum_s q_s \theta_s$, where $q = (q_s)$ and each $\theta_s$ is a probability
distribution, then

$$\mathrm{ent}(\theta) \le \mathrm{ent}(q) + \sum_s q_s\, \mathrm{ent}(\theta_s) \tag{3}$$

with equality if and only if the supports of the $\theta_s$ are essentially disjoint. In
random variable notation,

$$\mathrm{ent}(X) = \mathrm{ent}(f(X)) + \mathbb{E}\,\mathrm{ent}(X \mid f(X)) \tag{4}$$

where the random variable ent(X|Y) denotes entropy of the conditional
distribution. (Note this is what a probabilist would call "conditional entropy",
though information theorists use that phrase to mean $\mathbb{E}\,\mathrm{ent}(X \mid Y)$.) Write

$$\mathcal{E}(p) = -p \log p - (1 - p) \log(1 - p)$$

for the entropy of the Bernoulli(p) distribution. We will often use the fact

$$\mathcal{E}(p) \sim p \log \tfrac{1}{p} \text{ as } p \downarrow 0. \tag{5}$$

We will also often use the following three basic crude estimates. First,

$$\text{if } K_m \to \infty \text{ and } \frac{K_m}{N_m} \to 0 \text{ then } \log \binom{N_m}{K_m} \sim K_m \log \frac{N_m}{K_m}. \tag{6}$$

Second, for $X(n,p)$ with Binomial(n,p) distribution, if $0 < x_n < np$ and
$x_n/n \to x \in [0,p]$ then

$$\log \mathbb{P}(X(n,p) \le x_n) = -n \Lambda_p(x_n/n) + O(\log n), \tag{7}$$

where $\Lambda_p(x) := x \log \frac{x}{p} + (1 - x) \log \frac{1-x}{1-p}$. The first order term is standard
from large deviation theory and the second order estimate follows from finer
but still easy analysis; see for example Lemma 2.1 of [11]. Third, write
$G[N, M]$ for the number of graphs on vertex-set 1, ..., N with at most M
edges. It easily follows from (7) that

$$\text{if } \frac{M}{N} \to \zeta \in [0, \infty) \text{ then } \frac{\log G[N, M]}{N \log N} \to \zeta. \tag{8}$$
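
The crude estimate (8) can be checked numerically. The sketch below evaluates log G[N, M] through log-gamma functions and a log-sum-exp, with M = ζN for an arbitrarily chosen ζ = 1.5; the function names and the chosen values of N are assumptions made for illustration.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_num_graphs(N, M):
    """log G[N, M]: log of the number of graphs on N vertices with at most M edges."""
    pairs = N * (N - 1) // 2
    terms = [log_binom(pairs, m) for m in range(M + 1)]
    top = max(terms)
    # log-sum-exp keeps the sum of huge binomial coefficients within float range
    return top + math.log(sum(math.exp(t - top) for t in terms))


zeta = 1.5
ratios = {N: log_num_graphs(N, int(zeta * N)) / (N * math.log(N))
          for N in (100, 1000, 10000)}
for N, r in ratios.items():
    print(N, round(r, 4))
```

The ratios approach ζ only at rate 1/log N, which is typical of the "N log N" asymptotics used throughout.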



4 Easy examples 

Standard models of random graphs on vertices labelled 1,...,N can be 
adapted to our setting of vertex-names in several ways. In particular, one 
could either 

(i) re-write the integer label in binary, that is as a binary string; or 

(ii) replace the labels by distinct random strings as names. 
These two schemes are illustrated in the first two examples below. 

We present the results in a fixed format: a name for the model as a
subsection heading, a definition of the model $\mathcal{G}_N$, typically involving
parameters $\alpha, \beta, \ldots$, and a Proposition giving a formula for the entropy rate
$c = c(\alpha, \beta, \ldots)$ such that

$$\mathrm{ent}(\mathcal{G}_N) \sim cN \log N \text{ as } N \to \infty.$$

Model descriptions and calculations sometimes implicitly assume N is suffi- 
ciently large. 

These particular models are "easy" in the specific sense that indepen- 
dence of edges allows us to write down an exact expression for entropy; then 
calculations establish the asymptotics. We also give two general results, 
Lemmas 1 and 2, showing that graphs with short edges, or with similar 
names between connected vertices, have entropy rate zero. 



4.1 Sparse Erdős-Rényi, default binary names

Model. N vertices, whose names are the integers 1, ..., N written as
binary strings of length $\lceil \log_2 N \rceil$. Each of the $\binom{N}{2}$ possible edges is present
independently with probability $\alpha/N$, where $0 < \alpha < \infty$.

Entropy rate formula. $c(\alpha) = \alpha/2$.

Proof. The entropy equals $\binom{N}{2} \mathcal{E}(\alpha/N)$; letting $N \to \infty$ and using (5) gives
the formula.
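
The exact expression in the proof is easy to evaluate. The following sketch computes the ratio of the exact entropy to N log N for a few values of N with α = 3 (an arbitrary choice); the helper names are illustrative assumptions.

```python
import math


def bernoulli_entropy(p):
    """Entropy (natural log) of a Bernoulli(p) distribution; log1p keeps small p accurate."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log1p(-p)


def er_entropy_rate(N, alpha):
    """Exact ent(G_N) / (N log N) for sparse Erdos-Renyi with edge probability alpha/N."""
    pairs = N * (N - 1) / 2
    return pairs * bernoulli_entropy(alpha / N) / (N * math.log(N))


alpha = 3.0
rates = [er_entropy_rate(N, alpha) for N in (1e3, 1e6, 1e12, 1e100)]
print(rates)   # slowly approaches alpha / 2
```

The slow approach to α/2 reflects a correction term of relative order 1/log N.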
4.2 Sparse Erdős-Rényi, random A-ary names

Model. As above, N vertices, and each of the $\binom{N}{2}$ possible edges is present
independently with probability $\alpha/N$. Take $L_N \sim \beta \log_A N$ for $1 < \beta < \infty$
and take the vertex names as a uniform random choice of N distinct A-ary
strings of length $L_N$.

Entropy rate formula. $c(\alpha, \beta) = \beta - 1 + \alpha/2$.

Proof. The entropy equals $\log \binom{A^{L_N}}{N} + \binom{N}{2} \mathcal{E}(\alpha/N)$. The first term is
$\sim (\beta - 1) N \log N$ by (6) and the second term is $\sim \frac{\alpha}{2} N \log N$ as in the
previous model.

Remark. One might have naively guessed that the formula would involve
$\beta$ instead of $\beta - 1$, on the grounds that the entropy of the sequence of
names is $\sim \beta N \log N$, but this is the rate in a third model where a vertex
name is a pair $(i, a)$, where $1 \le i \le N$ and $a$ is the random string. This
model distinction becomes more substantial for the model to be studied in
Section 5.
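
The difference between β - 1 and β in the Remark can be seen numerically. This sketch compares the entropy of an unordered set of N distinct names with that of an ordered sequence of N distinct names, both drawn uniformly from N^β available strings; using N^β in place of $A^{L_N}$ and the particular values of N and β are assumptions for illustration.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


N = 10**6
beta = 2.0
universe = round(N ** beta)          # stands in for A^{L_N}, the number of available strings

# Unordered set of N distinct names: uniform over C(universe, N) possibilities.
unordered_rate = log_binom(universe, N) / (N * math.log(N))

# Ordered sequence of N distinct names: universe! / (universe - N)! possibilities.
ordered_rate = (log_binom(universe, N) + math.lgamma(N + 1)) / (N * math.log(N))

print(unordered_rate, ordered_rate)  # near beta - 1 and beta respectively
```

The two quantities differ by log N!, which is itself of order N log N and accounts for the unit gap between the two rates.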

4.3 Small Worlds Random Graph

Model. Start with $N = n^2$ vertices arranged in an $n \times n$ discrete torus,
where the name of each vertex is its coordinate-pair $(i, j)$ written as two
binary strings of lengths $\lceil \log_2 n \rceil$. Add the usual edges of the degree-4
nearest neighbor torus graph. Fix parameters $0 < \alpha, \gamma < \infty$. For each edge
$(w, v)$ of the remaining set $S$ of $\binom{N}{2} - 2N$ possible edges in the graph, add
the edge independently with probability $p_N(\|w - v\|_2)$, where $p_N(r) = a r^{-\gamma}$
and $a := a_{N,\gamma}$ is chosen such that the mean degree of the graph $\mathcal{G}_N$ of these
random edges $\to \alpha$ as $N \to \infty$ (see (11,12) for explicit expressions) and the
Euclidean distance $\|w - v\|_2$ is taken using the torus convention.

Entropy rate formula.

$$c(\alpha, \gamma) = \begin{cases} \alpha/2, & 0 < \gamma < 2 \\ \alpha/4, & \gamma = 2 \\ 0, & 2 < \gamma < \infty. \end{cases}$$

Remark. The different cases arise because for $\gamma < 2$ the edge-lengths are
order n whereas for $\gamma > 2$ they are O(1).

Proof. Write $r_{i,j} = \sqrt{i^2 + j^2}$ and $p_{i,j} = p_N(r_{i,j})$. The degree $D(v)$ of vertex
$v$ in $\mathcal{G}_N$ (assuming n is odd - the even case is only a minor modification)
has mean

$$\begin{aligned}
\mathbb{E} D(v) - 4 &= 4 \sum_{i,j=1}^{(n-1)/2} p_N(r_{i,j}) + 4 \sum_{i=2}^{(n-1)/2} p_N(r_{i,0}) \\
&= a \left( 4 \sum_{i,j=1}^{(n-1)/2} (i^2 + j^2)^{-\gamma/2} + 4 \sum_{i=2}^{(n-1)/2} i^{-\gamma} \right).
\end{aligned} \tag{9}$$

Similarly, the entropy of $\mathcal{G}_N$ is exactly

$$\mathrm{ent}(\mathcal{G}_N) = \frac{N}{2} \left( 4 \sum_{i,j=1}^{(n-1)/2} \mathcal{E}(p_N(r_{i,j})) + 4 \sum_{i=2}^{(n-1)/2} \mathcal{E}(p_N(r_{i,0})) \right). \tag{10}$$

One can analyze these expressions separately in the three cases. First
consider the "critical" case $\gamma = 2$. Here the quantity in parentheses in (9) is
$\sim \int_1^n 2\pi r^{-1}\,dr \sim 2\pi \log n \sim \pi \log N$. We therefore take

$$a = a_{N,2} \sim \frac{\alpha}{\pi \log N} \tag{11}$$

so that $\mathcal{G}_N$ has mean degree $\to 4 + \alpha$. Evaluating the entropy similarly,
where in the second line the "log a" term is asymptotically negligible,

$$\begin{aligned}
\mathrm{ent}(\mathcal{G}_N) &\sim \frac{N}{2} \int_1^{(n-1)/2} 2\pi r\, \mathcal{E}(a r^{-2})\,dr \\
&\sim N\pi \int_1^{(n-1)/2} r \cdot a r^{-2} \cdot (-\log a + 2 \log r)\,dr \\
&\sim 2N\pi a \int_1^{(n-1)/2} r^{-1} \log r\,dr \\
&\sim 2N\pi a \cdot \tfrac{1}{2} \log^2 n \\
&\sim \tfrac{\alpha}{4} N \log N
\end{aligned}$$

giving the asserted entropy rate formula in this case $\gamma = 2$.

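In the case $\gamma < 2$, more elaborate though straightforward calculations
(see appendix) show that to have the mean degree $\to \alpha + 4$ we take

$$a = a_{N,\gamma} \sim \frac{\alpha (2 - \gamma)}{2^{1+\gamma}\, n^{2-\gamma} \int_0^{\pi/4} \sec^{2-\gamma}(\theta)\,d\theta} \tag{12}$$

and then establish the asserted entropy rate $\alpha/2$.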



In the case $\gamma > 2$ the mean length of the edges of $\mathcal{G}_N$ becomes O(1). One
could repeat calculations for this case, but the asserted zero entropy rate
follows from the more general Lemma 1 later, as explained in Section 4.6.
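
The exact sums (9) and (10) can be evaluated directly for moderate n. In the sketch below the constant a is fixed exactly so that the random (non-torus) edges have mean degree α at the given n, rather than through the asymptotic expressions (11) and (12), and only the entropy of the random edges is counted since the torus edges are deterministic. The function names and the parameter values are assumptions for illustration.

```python
import math


def bernoulli_entropy(p):
    """Entropy (natural log) of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log1p(-p)


def small_world_rate(n, alpha, gamma):
    """Return (mean number of random edges per vertex, entropy / (N log N)) for odd n."""
    half = (n - 1) // 2
    # Offsets in one open quadrant (i, j >= 1) and on one half-axis (i >= 2, j = 0);
    # by symmetry each such offset occurs 4 times around a vertex, as in (9).
    radii = [math.hypot(i, j) for i in range(1, half + 1) for j in range(1, half + 1)]
    radii += [float(i) for i in range(2, half + 1)]
    a = alpha / (4 * sum(r ** -gamma for r in radii))
    probs = [a * r ** -gamma for r in radii]     # must all be <= 1 for the model to make sense
    mean_degree = 4 * sum(probs)
    N = n * n
    entropy = (N / 2) * 4 * sum(bernoulli_entropy(p) for p in probs)   # formula (10)
    return mean_degree, entropy / (N * math.log(N))


alpha, n = 2.0, 101
results = {gamma: small_world_rate(n, alpha, gamma) for gamma in (1.0, 2.0, 3.0)}
for gamma, (deg, rate) in results.items():
    print(gamma, round(deg, 6), round(rate, 3))
```

At this size the three rates are already ordered as the formula predicts, but they are still far from the limits α/2, α/4 and 0 in the last two cases, because the convergence is only logarithmic in N.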

Remark. The case $\gamma < 2$ suggests a general principle that models with
"long edges" should have the same entropy rates as if the edges were uniform
random subject to the same degree distribution. But there seems no general
formulation of such a result without explicit dependence assumption.



4.4 Edge-probabilities depending on Hamming distance 

We first describe a general model, then the specialization that we shall 
analyze. 

General model. Fix an alphabet $\mathbf{A}$ of size $A$. For each N choose
$L_N$ such that $N \le A^{L_N}$, and suppose $L_N \sim \beta \log_A N$ for some $\beta \in [1, \infty)$.

Take N vertex-names as a uniform random choice of distinct length-$L_N$
strings from $\mathbf{A}$. Write $d_H(a, a') = |\{i : a_i \ne a'_i\}|$ for Hamming distance
between names. For each N let $w = w_N$ be a sequence of decreasing weights
$1 = w(1) \ge w(2) \ge \ldots \ge w(L_N) \ge 0$. We want the probability of an edge
between vertices $(a, a')$ to be proportional to $w(d_H(a, a'))$. For each vertex
$a$, the expectation of the sum of $w(d_H(a, a'))$ over other vertices $a'$ equals

$$\mu_N := \frac{N-1}{1 - A^{-L_N}} \sum_{u=1}^{L_N} \binom{L_N}{u} \left(\frac{A-1}{A}\right)^u \left(\frac{1}{A}\right)^{L_N - u} w(u). \tag{13}$$

Fix $0 < \alpha < \infty$, and make a random graph $\mathcal{G}_N$ with mean degree $\alpha$ by
specifying that, conditional on the set of vertex-names, each possible edge
$(a, a')$ is present independently with probability $\alpha\, w(d_H(a, a'))/\mu_N$.

Note that in order for this model to make sense, we need $\mu_N \ge \alpha$, which
is not guaranteed by the description of the model.

Intuitively, we expect that the lengths (measured by Hamming distance)
of edges will be around the $\ell_N$ maximizing $\binom{L_N}{\ell_N} w_N(\ell_N)$, and that for all
(suitably regular) choices of $w_N$ with $\ell_N / L_N \to d \in [0, 1]$ the entropy rate
will involve $w_N$ only via the limit $d$. Stating and proving a general such
result seems messy, so we will study only the special case

$$w(u) = \begin{cases} 1, & 1 \le u \le M_N \\ 0, & M_N < u \le L_N \end{cases} \tag{14}$$

$$M_N / L_N \to d \in (0, 1 - \tfrac{1}{A}) \tag{15}$$

$$\beta \Lambda_{1-1/A}(d) < \log A \tag{16}$$

for $\Lambda_p(d)$ as at (7). Here condition (16) is needed, as we will see at (18), to
make $\mu_N \to \infty$. Note that for the case $d = 0$ one could use Lemma 1 later
and the accompanying conditioning argument in Section 4.6 to show that
the entropy rate of $\mathcal{G}_N$ equals the rate $(\beta - 1)$ for the set of vertex-names.
The opposite case $(1 - 1/A) \le d \le 1$ is essentially the model of Section 4.2
and the rate becomes $\beta - 1 + \alpha/2$: as expected, these rates are the $d \to 0$
and the $d \to 1 - 1/A$ limits of the rates in our formula below.

Entropy rate formula. In the special case (14 - 16),

$$c(A, \alpha, \beta, d) = \beta - 1 + \frac{\alpha}{2} \left( 1 - \frac{\beta \Lambda_{1-1/A}(d)}{\log A} \right).$$

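To establish this formula, first observe that for Binomial $X(\cdot, \cdot)$ as at (7)

$$\mu_N = \frac{N-1}{1 - A^{-L_N}}\, \mathbb{P}(1 \le X(L_N, 1 - 1/A) \le M_N), \tag{17}$$

and so by (7)

$$\frac{\log \mu_N}{\log N} \to 1 - \frac{\beta \Lambda_{1-1/A}(d)}{\log A}. \tag{18}$$

So condition (16) ensures that $\mu_N \to \infty$ and therefore the model makes
sense. Write Names (to avoid overburdening the reader with symbols) for
the random unordered set of vertex-names, and use (4) to write

$$\mathrm{ent}(\mathcal{G}_N) = \mathrm{ent}(\mathrm{Names}) + \mathbb{E}\,\mathrm{ent}(\mathcal{G}_N \mid \mathrm{Names}).$$

As in Section 4.2 the contribution to the entropy rate from the first term is
$\beta - 1$. For the second term, write

$$\mathrm{ent}(\mathcal{G}_N \mid \mathrm{Names}) = \sum_{a \ne a'} \mathcal{E}\!\left(\frac{\alpha}{\mu_N}\right) 1_{(a \in \mathrm{Names},\, a' \in \mathrm{Names})}\, 1_{(d_H(a,a') \le M_N)}$$

where the sum is over unordered pairs $\{a, a'\}$ in $\mathbf{A}^{L_N}$. Take expectation to
get

$$\mathbb{E}\,\mathrm{ent}(\mathcal{G}_N \mid \mathrm{Names}) = \mathcal{E}\!\left(\frac{\alpha}{\mu_N}\right) \frac{N(N-1)}{A^{L_N}(A^{L_N} - 1)} \sum_{a \ne a'} 1_{(d_H(a,a') \le M_N)}.$$

But from the definition (13) of $\mu_N$ this simplifies to $\frac{N \mu_N}{2} \mathcal{E}(\alpha/\mu_N)$, and then
from (5)

$$\mathbb{E}\,\mathrm{ent}(\mathcal{G}_N \mid \mathrm{Names}) \sim \frac{\alpha N}{2} \log \mu_N.$$

Appealing to (18) establishes the entropy rate formula.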
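
For concrete parameters the quantity $\mu_N$ in the special case (14) can be computed exactly in two ways: by counting strings at each Hamming distance, and through the Binomial probability in (17). The sketch below checks that the two agree and evaluates the conditional entropy $\frac{N \mu_N}{2} \mathcal{E}(\alpha/\mu_N)$; the function names and the parameter values (N = 10000, A = 4, L = 10, M = 3, α = 2) are assumptions chosen only for illustration.

```python
import math


def mu_direct(N, A, L, M):
    """mu_N with w(u) = 1 for 1 <= u <= M: (N-1) times the fraction of the
    other A^L - 1 strings lying within Hamming distance M of a given string."""
    within = sum(math.comb(L, u) * (A - 1) ** u for u in range(1, M + 1))
    return (N - 1) * within / (A ** L - 1)


def mu_binomial(N, A, L, M):
    """The same quantity written as in (17), via a Binomial(L, 1 - 1/A) probability."""
    p = 1 - 1 / A
    prob = sum(math.comb(L, u) * p ** u * (1 - p) ** (L - u) for u in range(1, M + 1))
    return (N - 1) / (1 - A ** (-L)) * prob


def bernoulli_entropy(p):
    """Entropy (natural log) of a Bernoulli(p) distribution."""
    return -p * math.log(p) - (1.0 - p) * math.log1p(-p)


N, A, L, M, alpha = 10_000, 4, 10, 3, 2.0
mu = mu_direct(N, A, L, M)
cond_entropy = N * mu / 2 * bernoulli_entropy(alpha / mu)   # E ent(G_N | Names)
print(mu, mu_binomial(N, A, L, M), cond_entropy)
```

Here $\mu_N$ comfortably exceeds α, so the edge probabilities α/μ_N are valid; with a smaller M the same computation shows how the requirement $\mu_N \ge \alpha$ can fail.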
4.5 Non-uniform and uniform random trees

Model. Construct a random tree $T_N$ on vertices 1, ..., N as follows. Take
$V_3, V_4, \ldots, V_N$ independent uniform on {1, ..., N}. Link vertex 2 to vertex
1. For k = 3, 4, ..., N link vertex k to vertex $\min(k - 1, V_k)$.

Entropy rate formula. c = 1/2.

Proof.

$$\mathrm{ent}(T_N) = \sum_{k=3}^{N} \mathrm{ent}(W_k)$$

where $W_k = \min(k - 1, V_k)$ has entropy

$$\mathrm{ent}(W_k) = \frac{k-2}{N} \log N + \frac{N-k+2}{N} \log \frac{N}{N-k+2}.$$

The sum of the first term is $\sim \frac{1}{2} N \log N$ and the sum of the second term is of
smaller order.

Remark. This tree arose in [2], where it was shown (by an indirect
argument) that if one first constructs $T_N$, then applies a uniform random
permutation to the vertex-labels, the resulting random tree $T^*_N$ is uniform
on the set of all labelled trees. Cayley's formula tells us there are $N^{N-2}$
labelled trees, so $\mathrm{ent}(T^*_N) = \log N^{N-2}$ and so $(T^*_N)$ has entropy rate c = 1.
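
Both rates in this subsection can be checked by summing the exact formula for ent(W_k). The sketch below does this for one value of N; the function name and the choice N = 100000 are illustrative assumptions.

```python
import math


def tree_entropy(N):
    """Exact ent(T_N): sum over k = 3..N of ent(W_k), where W_k = min(k - 1, V_k)."""
    total = 0.0
    for k in range(3, N + 1):
        # W_k takes each value 1..k-2 with probability 1/N,
        # and the value k-1 with the remaining probability (N-k+2)/N.
        tail = (N - k + 2) / N
        total += (k - 2) / N * math.log(N) - tail * math.log(tail)
    return total


N = 100_000
nonuniform_rate = tree_entropy(N) / (N * math.log(N))
uniform_rate = (N - 2) * math.log(N) / (N * math.log(N))   # Cayley: log N^(N-2)
print(nonuniform_rate, uniform_rate)
```

The gap between the two rates is the entropy contributed by the uniform random relabelling of the vertices.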

4.6 Conditions for zero entropy rate 

Here we will give two complementary conditions under which the entropy 
rate is zero. Lemma 1 concerns the case where we start with deterministic 
vertex-names, and add random edges which mostly link a vertex to some 
of the "closest" vertices, specifically to vertices amongst the ($o(N^{\varepsilon})$ for all
$\varepsilon > 0$) closest vertices. Lemma 2 concerns the case where we start with
a deterministic graph on unlabelled vertices, and add random vertex-labels
such that vertices linked by an edge mostly have names that differ in only
o(log N) places. Note that these lemmas may then be applied
conditionally. That is, if we start with a random unordered set of names, and then
(conditional on the set of names) add random edges in a way satisfying the
assumptions of Lemma 1, then the entropy rate of the resulting $\mathcal{G}_N$ will equal
the entropy rate of the original random unordered set of names. Similarly,
if we start with a random graph on unlabelled vertices, then (conditional
on the graph) add random names in a way satisfying the assumptions of
Lemma 2, then the entropy rate of the resulting $\mathcal{G}_N$ will equal the entropy
rate of the original random unlabelled graph.
Lemma 1 For each N, suppose we take N vertices with deterministic names
(w.l.o.g. $1 \le i \le N$ written as binary strings, to fit our set-up) and suppose
for each $i$ we are given an ordering $j(i, 1), j(i, 2), \ldots, j(i, N-1)$ of the other
vertices. Say that an edge $(i, j = j(i, \ell))$ with $i < j$ has length $\ell$. Consider a
sequence of random graphs $\mathcal{G}_N$ whose distribution is arbitrary subject to

(i) The number of edges $E_N$ satisfies $\mathbb{P}(E_N > \beta N) = o(N^{-1} \log N)$ for some
constant $\beta < \infty$;

(ii) For some $M_N$ such that $\log(M_N) = o(\log N)$, the r.v.

$X_N$ := number of edges with length greater than $M_N$

satisfies $\mathbb{P}(X_N > \delta N) = o(1)$ for all $\delta > 0$.

Then the entropy rate is c = 0.

Remark. The lemma applies to the $\gamma > 2$ case of the "small worlds" model
in Section 4.3. Take the ordering induced by the natural distance between
vertices. In this case, $E_N$ is a sum of independent indicators with $\mathbb{E} E_N \sim cN$
for some constant c. Standard concentration results (e.g. [8] Theorem 2.15)
imply (i) for any $\beta > c$, and (ii) follows since for any sequence $M_N \to \infty$ we
have $\mathbb{E} X_N = O(N M_N^{1 - \gamma/2}) = o(N)$.

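Proof. We first show that the result holds with (i) replaced by

(i') $\mathbb{P}(E_N > \beta N) = 0$,

and then use this modified statement to prove the lemma.

Assume now that $\mathcal{G}_N$ satisfies (i') and (ii) and write $\mathcal{G}_N$, considered as
an edge-set, as a disjoint union $\mathcal{G}'_N \cup \mathcal{G}''_N$, where $\mathcal{G}'_N$ consists of the edges
of length $\le M_N$. Because $\mathcal{G}'_N$ contains at most $\beta N$ edges out of a set of at
most $N M_N$ edges,

$$\mathrm{ent}(\mathcal{G}'_N) \le \log(\beta N) + \log \binom{N M_N}{\beta N} = o(N \log N) \text{ by (6)}.$$

Now fix $\delta > 0$ and condition on whether the number $X_N$ of edges of $\mathcal{G}''_N$ is
bigger or smaller than $\delta N$. Using (3) we get

$$\mathrm{ent}(\mathcal{G}''_N) \le \log 2 + \log G[N, \delta N] + \mathbb{P}(X_N > \delta N) \log G[N, \beta N].$$

Now $\mathbb{P}(X_N > \delta N) \to 0$ by assumption (ii), and then using (8) we get

$$\mathrm{ent}(\mathcal{G}''_N) \le (\delta + o(1)) N \log N.$$

Because $\delta > 0$ is arbitrary we conclude

$$\mathrm{ent}(\mathcal{G}_N) \le \mathrm{ent}(\mathcal{G}'_N) + \mathrm{ent}(\mathcal{G}''_N) = o(N \log N).$$
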
Now assume that $\mathcal{G}_N$ satisfies the weaker hypotheses (i) and (ii). Defining
$\tilde{\mathcal{G}}_N$ to have the conditional distribution of $\mathcal{G}_N$ given $E_N \le \beta N$, it is
clear that $\tilde{\mathcal{G}}_N$ satisfies (i'). We will show that it also satisfies (ii), implying
(by the previous result) that the entropy rate of $\tilde{\mathcal{G}}_N$ is zero. Let $\delta > 0$.
Conditioning on the event ($A_N$, say) that $E_N \le \beta N$,

$$\mathbb{P}(X_N > \delta N) = \mathbb{P}(X_N > \delta N \mid A_N)\, \mathbb{P}(A_N) + \mathbb{P}(X_N > \delta N \mid A_N^c)\, \mathbb{P}(A_N^c). \tag{19}$$

By (ii), the term on the left hand side of (19) is o(1), and by (i), $\mathbb{P}(A_N^c) = o(1)$,
and so also $\mathbb{P}(A_N) \to 1$. Thus, $\mathbb{P}(X_N > \delta N \mid A_N)$ must be o(1), as
desired.

To complete the proof, use (3) to write

$$\begin{aligned}
\mathrm{ent}(\mathcal{G}_N) &\le \mathcal{E}(\mathbb{P}(A_N)) + \mathrm{ent}(\mathcal{G}_N \mid A_N)\, \mathbb{P}(A_N) + \mathrm{ent}(\mathcal{G}_N \mid A_N^c)\, \mathbb{P}(A_N^c) \\
&\le \log 2 + \mathrm{ent}(\tilde{\mathcal{G}}_N) + \mathbb{P}(A_N^c) \binom{N}{2} \log 2.
\end{aligned}$$

The entropy rate of $\tilde{\mathcal{G}}_N$ is zero, and assumption (i) is exactly that
$\mathbb{P}(A_N^c) = o(N^{-1} \log N)$, so $\mathrm{ent}(\mathcal{G}_N) = o(N \log N)$, as desired.

Lemma 2 Take a deterministic graph on N unlabelled vertices, and let $c_N$
denote the number of components and $e_N$ the number of edges. Construct
$\mathcal{G}_N$ by assigning random distinct vertex-names $a(v)$ of length O(log N) to
vertices $v$, their distribution being arbitrary subject to

$$\sum_{\text{edges } (a, a')} d_H(a, a') = o(N \log N) \text{ in probability}.$$

If $e_N = O(N)$ and $c_N = o(N)$ then $\mathcal{G}_N$ has entropy rate zero.

Proof. By a straightforward truncation argument we may assume there is
a deterministic bound

$$\sum_{\text{edges } (a, a')} d_H(a, a') \le s_N = o(N \log N).$$

The name-lengths are $\le \beta \log N$ for some $\beta$. Consider first the case where
there is a single component. Take an arbitrary spanning tree with arbitrary
root, and write the edges of the tree in breadth-first order as $e_1, \ldots, e_{N-1}$.
We can specify $\mathcal{G}_N$ by specifying first the name of the root; then for each
edge $e_i = (v, v')$ directed away from the root, specify the coordinates where
$a(v')$ differs from $a(v)$ and specify the values of $a(v')$ at those coordinates.
Write $\mathcal{S}$ for the random set of all these differing coordinates. Conditional on
$\mathcal{S} = S$ the entropy of $\mathcal{G}_N$ is at most $(|S| + \beta \log N) \log A$, where the $\beta \log N$
term arises from the root name. So using (3)

$$\mathrm{ent}(\mathcal{G}_N) \le \mathrm{ent}(\mathcal{S}) + (s_N + \beta \log N) \log A.$$

With $c_N$ components the same argument shows

$$\mathrm{ent}(\mathcal{G}_N) \le \mathrm{ent}(\mathcal{S}) + (s_N + c_N \beta \log N) \log A.$$

The second term is $o(N \log N)$ by assumption, and

$$\mathrm{ent}(\mathcal{S}) \le \log \left( \sum_{i \le s_N} \binom{N \beta \log N}{i} \right) = o(N \log N),$$

the final relation by e.g. the p = 1/2 case of (7).

4.7 Summary

The reader will recognize the models in this section as standard random 
graph models, adapted to our setting in one of several ways. One can take a 
model of dynamic growth, adding one vertex at a time, and then assign the 
k'th vertex a name, e.g. the "default binary" or "random A-ary" used in the
Erdős-Rényi models. Alternatively, as in the "Hamming distance" model,
one can start with N vertices with assigned names and then add edges
according to some probabilistic rule involving the names of end-vertices.
Roughly speaking, for any existing random graph model where one can
calculate anything, one can calculate the entropy rate for such adapted models.
But this is an activity perhaps best left for future Ph.D. theses. We are more 
interested in models where the graph structure and the name structure each 
simultaneously influence the other, rather than starting by specifying one 
structure and having that influence the other. It is not so easy to devise 
tractable such models, but the next section shows our attempt. 

5 A hybrid model 

In this section we study a model for which calculation of the entropy rate is
less straightforward. It incidentally reveals a connection between our setting
and the more familiar setting of "graph entropy".
5.1 The model



In outline, the graph structure is again sparse Erdős-Rényi $\mathcal{G}(N, \alpha/N)$, but
we construct it inductively over vertices, and make the vertex-names copy
parts of the names of previous vertices that the current vertex is linked to.
Here are the details.

Model: Erdős-Rényi with hybrid names. Take $L_N \sim \beta \log_A N$ for
$1 < \beta < \infty$. Vertex 1 is given a uniform random length-$L_N$ A-ary name.
For $1 \le n \le N - 1$:

vertex n + 1 is given an edge to each vertex $i \le n$ independently
with probability $\alpha/N$. Write $Q_n \ge 0$ for the number of such
edges, and $a^1, \ldots, a^{Q_n}$ for the names of the linked vertices. Take
an independent uniform random length-$L_N$ A-ary string $a^0$.
Assign to vertex n + 1 the name obtained by, independently for
each coordinate $1 \le u \le L_N$, making a uniform random choice
from the $Q_n + 1$ letters $a^0_u, a^1_u, \ldots, a^{Q_n}_u$.

See Figure 1. This model gives a family $(\mathcal{G}_N)$ parametrized by $(A, \beta, \alpha)$.
Note that this scheme for defining "hybrid" names could be used with any
sequential construction of a random graph, for instance preferential
attachment models.
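
The construction can be simulated directly. The following is a minimal sketch under stated assumptions: vertices are indexed from 0, the name length is taken as the ceiling of β log_A N (one concrete choice satisfying $L_N \sim \beta \log_A N$), and the function name `hybrid_model` and its arguments are illustrative, not from the paper.

```python
import math
import random


def hybrid_model(N, alpha, beta, alphabet, seed=0):
    """Sample (names, edges) from the Erdos-Renyi model with hybrid names."""
    rng = random.Random(seed)
    L = math.ceil(beta * math.log(N, len(alphabet)))   # assumed concrete choice of L_N

    def fresh_name():
        return [rng.choice(alphabet) for _ in range(L)]

    names = [fresh_name()]                 # vertex 1 (index 0): uniform random name
    edges = []
    for new in range(1, N):                # index `new` is vertex new + 1
        linked = [i for i in range(new) if rng.random() < alpha / N]
        edges.extend((i, new) for i in linked)
        # Candidate sources: a^0 (fresh string) and the names a^1, ..., a^{Q_n}.
        sources = [fresh_name()] + [names[i] for i in linked]
        # Each coordinate independently copies the letter of a uniformly chosen source.
        names.append([rng.choice(sources)[u] for u in range(L)])
    return ["".join(name) for name in names], edges


names, edges = hybrid_model(200, alpha=2.0, beta=2.0, alphabet="abcd", seed=1)
print(names[:5], len(edges))
```

The edge-sampling loop costs O(N^2) time, which is fine for small experiments; a geometric-skipping sampler would be the usual replacement for large N.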



Figure 1. Schematic for the hybrid model. A vertex (right) arrives with 
some "original name" bbdabc and is attached to two previous vertices with names 
dafcbb and bfecad. The name given to the new vertex is obtained by copying for 
each position the letter in that position in a uniform random choice from the three 
names. Choosing the underlined letters gives the name bafcac shown in the figure.

5.2 The ordered case 

This model illustrates a distinction mentioned in Section 4.2. In the
construction above, the nth vertex is assigned a name, say $a^n$, during the
construction, but in the final graph $\mathcal{G}_N$ we do not see the value of n for a vertex.
The "ordered" model $(\mathcal{G}^{ord}_N)$ in which we do see the value of n for each
vertex, by making the name be $(n, a^n)$, is a different model whose analysis is
conceptually more straightforward, so we will start with that model. We
return to the unordered model in Section 5.6.
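Entropy rate formula for $(\mathcal{G}^{ord}_N)$.

$$c = \frac{\alpha}{2} + \beta \sum_{k \ge 0} \frac{\alpha^k J_k(\alpha)\, h_A(k)}{k! \log A} \tag{20}$$

where

$$J_k(\alpha) := \int_0^1 x^k e^{-\alpha x}\,dx$$

and the constants $h_A(k)$ are defined at (25).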

Write $\mathcal{G}_{N,n}$ for the partial graph obtained after vertex n has been
assigned its edges to previous vertices and then its name. We will show that,
for deterministic $e_{N,n}$ defined at (27) below, as $N \to \infty$ the entropies of the
conditional distributions satisfy

$$\max_{1 \le n \le N-1} \mathbb{E} \left| \mathrm{ent}(\mathcal{G}_{N,n+1} \mid \mathcal{G}_{N,n}) - e_{N,n} \right| = o(\log N). \tag{21}$$

By the chain rule (4) this immediately implies

$$\mathrm{ent}(\mathcal{G}^{ord}_N) - \sum_{n=1}^{N-1} e_{N,n} = o(N \log N)$$

which will establish the entropy rate formula.

The key ingredient is the following technical lemma; note that the
measures $\mu^{\mathbf{i}}$ below depend on the realization of $\mathcal{G}^{ord}_N$ and are therefore random
quantities. Write "ave" for average, and write

$$\|\Theta\|^{(k)} := \frac{1}{2} \sum_{a \in \mathbf{A}^k} \left| \Theta(a) - A^{-k} \right|$$

for the variation distance between a probability distribution $\Theta$ on $\mathbf{A}^k$ and
the uniform distribution.

Lemma 3 Write $(n, a^n)$, $1 \le n \le N$ for the vertex-names of $\mathcal{G}^{ord}_N$. For
each $k \ge 1$ and $\mathbf{i} := (i_1, \ldots, i_k)$ with $1 \le i_1 < \cdots < i_k \le N$, write $\mu^{\mathbf{i}}$ for the
empirical distribution of $(a^{i_1}_u, \ldots, a^{i_k}_u)$, $1 \le u \le L_N$. That is, the probability
distribution on $\mathbf{A}^k$

$$\mu^{\mathbf{i}}(x_1, \ldots, x_k) := L_N^{-1} \sum_{u=1}^{L_N} 1_{(a^{i_1}_u = x_1, \ldots, a^{i_k}_u = x_k)}.$$

Then



n\ ( A k l 2 h 2 \ 

A%> := max Ell ave p'llW < C\ - + -- . (22) 

^ 2<«<v M i<n<...<i fc <n p 11 - N J y ' 

for a constant C not depending on k,N. 

We defer the proof to Section 5.3. 
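The objects in Lemma 3 can be made concrete with a small sketch. It computes the empirical distribution of the letter tuples read off coordinate by coordinate from $k$ names, and then its variation distance from the uniform distribution on $\mathbf{A}^k$, using the factor $\tfrac12$ convention for variation distance. The function names and the toy names over the alphabet {a, b} are illustrative assumptions, not part of the paper.

```python
from collections import Counter
from itertools import product


def empirical_distribution(names):
    """Empirical distribution of the letter tuples (a_u^{i_1}, ..., a_u^{i_k}), 1 <= u <= L."""
    length = len(names[0])
    assert all(len(name) == length for name in names), "names must share one length"
    counts = Counter(zip(*names))  # one k-tuple of letters per coordinate u
    return {letters: c / length for letters, c in counts.items()}


def variation_distance_from_uniform(mu, alphabet, k):
    """(1/2) * sum over x in alphabet^k of |mu(x) - A^{-k}|."""
    uniform_mass = len(alphabet) ** (-k)
    return 0.5 * sum(abs(mu.get(x, 0.0) - uniform_mass)
                     for x in product(alphabet, repeat=k))


# Hypothetical toy names over the alphabet {a, b}; k = 2 names of length L = 4.
alphabet = "ab"
balanced = ["abab", "aabb"]   # coordinate pairs: (a,a), (b,a), (a,b), (b,b)
identical = ["abab", "abab"]  # coordinate pairs: (a,a), (b,b), (a,a), (b,b)

dist_balanced = variation_distance_from_uniform(empirical_distribution(balanced), alphabet, 2)
dist_identical = variation_distance_from_uniform(empirical_distribution(identical), alphabet, 2)
print(dist_balanced, dist_identical)
```

The balanced pair hits every element of $\mathbf{A}^2$ equally often, so its distance is zero, while two identical names only ever produce diagonal pairs and sit far from uniform. This is the kind of dependence between names that the second summand in (22) accounts for.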

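Fix $N$ and $n$, and consider $\mathrm{ent}(\mathcal{G}_{N,n+1} | \mathcal{G}_{N,n})$, the entropy of the conditional distribution. Conditioning on the edges of vertex $n+1$ in $\mathcal{G}_{N,n+1}$, and using the chain rule (4), we find

$$\mathrm{ent}(\mathcal{G}_{N,n+1} | \mathcal{G}_{N,n}) = n \mathcal{E}(\alpha/N) + \sum_{\substack{k = 0, \dots, n \\ 1 \le i_1 < \dots < i_k \le n}} \left(\frac{\alpha}{N}\right)^{k} \left(1 - \frac{\alpha}{N}\right)^{n-k} \mathrm{ent}(\mathbf{a}^{n+1} | \mathcal{G}_{N,n}, n+1 \to \{i_1, \dots, i_k\}), \tag{23}$$

where $n+1 \to \{i_1, \dots, i_k\}$ denotes the event that vertex $n+1$ connects to vertices $i_1, \dots, i_k$ and no others. The contribution to the entropy from the choice of edges is $n \mathcal{E}(\alpha/N)$, which as in previous models contributes (after summing over $n$) the first term $\alpha/2$ of the entropy rate formula, so in the following we need consider only the contribution from names, that is the sum in (23). Consider the contribution to the sum (23) from $k = 2$, that is on the event $\{Q_n = 2\}$ that vertex $n+1$ links to exactly two previous vertices. Conditional on these being a particular pair $1 \le i < j \le n$, with names $\mathbf{a}^i, \mathbf{a}^j$, the contribution to entropy is exactly

$$\mathrm{ent}(\mathbf{a}^{n+1} | \mathcal{G}_{N,n}, n+1 \to \{i, j\}) = L_N \sum_{(a,a') \in \mathbf{A} \times \mathbf{A}} g_2(a,a')\, \mu^{(i,j)}(a,a')$$

where

$$g_2(a,a') = \begin{cases} \mathcal{E}_A\left(\frac{A+1}{3A}, \frac{A+1}{3A}, \frac{1}{3A}, \dots, \frac{1}{3A}\right) & \text{if } a' \ne a \\ \mathcal{E}_A\left(\frac{2A+1}{3A}, \frac{1}{3A}, \frac{1}{3A}, \dots, \frac{1}{3A}\right) & \text{if } a' = a \end{cases}$$

and where $\mathcal{E}_A(\mathbf{p})$ is the entropy of a distribution $\mathbf{p} = (p_1, \dots, p_A)$. Now unconditioning on the pair $(i,j)$, the contribution to $\mathrm{ent}(\mathcal{G}_{N,n+1} | \mathcal{G}_{N,n})$ from the event $\{Q_n = 2\}$, that is the $k = 2$ term of the sum (23), equals

$$L_N \sum_{1 \le i < j \le n} \left(\frac{\alpha}{N}\right)^{2} \left(1 - \frac{\alpha}{N}\right)^{n-2} \sum_{(a,a') \in \mathbf{A} \times \mathbf{A}} g_2(a,a')\, \mu^{(i,j)}(a,a')$$

$$= L_N \binom{n}{2} \left(\frac{\alpha}{N}\right)^{2} \left(1 - \frac{\alpha}{N}\right)^{n-2} \sum_{(a,a') \in \mathbf{A} \times \mathbf{A}} g_2(a,a') \mathop{\mathrm{ave}}_{1 \le i < j \le n} \mu^{(i,j)}(a,a'). \tag{24}$$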

Lemma 3 now tells us that the sum in (24) differs from

$$h_A(2) := A^{-2} \sum_{(a,a') \in \mathbf{A} \times \mathbf{A}} g_2(a,a')$$

by at most $2 g_2^* \Delta_N^{(2)}$, where $g_2^* \le \log A$ is the maximum possible value of $g_2(a,a')$ and $\Delta_N^{(2)}$ is as defined in Lemma 3. So to first order as $N \to \infty$, the quantity (24) is

$$e_{N,n,2} := \beta \log_A N \times \alpha^2 h_A(2)\, \frac{(n/N)^2}{2!}\, \exp(-\alpha n/N),$$

with an error bounded by 



iV • 



A similar argument applies to the terms in the sum (23) for a general number $k$ of links. In brief, we define

$$e_{N,n,k} := \beta \log_A N \times \alpha^k h_A(k)\, \frac{(n/N)^k}{k!}\, \exp(-\alpha n/N)$$

where

$$h_A(k) := A^{-k} \sum_{(a_1, \dots, a_k) \in \mathbf{A}^k} \mathrm{ent}(\mathbf{p}^{[a_1, \dots, a_k]}), \tag{25}$$

and where $\mathbf{p}^{[a_1, \dots, a_k]}$ is the probability distribution $\mathbf{p}$ on $\mathbf{A}$ defined by

$$p^{[a_1, \dots, a_k]}(a) = \frac{1 + A \times |\{i : a_i = a\}|}{(1+k)A}.$$

Also for $k = 0$ we set $h_A(0) = \log A$, the entropy of the uniform distribution on $\mathbf{A}$. Repeating the argument from the case $k = 2$, we find that (23) is, to first order, $\sum_{k \ge 0} e_{N,n,k}$, with error of order



k=0 



Applying Lemma 3 to bound $\Delta_N^{(k)}$ and then using simple properties of the binomial distribution yields that (26) is $o(\log N)$.
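The constants $h_A(k)$ of (25) are finite averages, so they can be tabulated by brute force for small $A$ and $k$. The sketch below assumes the reading $p^{[a_1,\dots,a_k]}(a) = (1 + A\,|\{i : a_i = a\}|)/((1+k)A)$ of the defining display, uses natural logarithms, and encodes letters as the integers $0, \dots, A-1$; the function names are illustrative.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy (natural log) of a finite distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def p_given_neighbours(neigh_letters, A):
    """Distribution on letters 0..A-1 given the letters a_1..a_k seen at one coordinate."""
    k = len(neigh_letters)
    return [(1 + A * neigh_letters.count(a)) / ((1 + k) * A) for a in range(A)]


def h(A, k):
    """h_A(k): average of ent(p^{[a_1..a_k]}) over all A^k letter tuples."""
    total = sum(entropy(p_given_neighbours(t, A)) for t in product(range(A), repeat=k))
    return total / A**k


A = 3
for k in range(5):
    print(k, h(A, k), math.log(A))
```

For $k = 0$ the only tuple is the empty one and the distribution is uniform, which recovers $h_A(0) = \log A$; for $k = 2$ the average reduces to the two cases of $g_2$.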
So we are now in the setting of (21) with

$$e_{N,n} := n \mathcal{E}(\alpha/N) + \sum_{k \ge 0} e_{N,n,k}. \tag{27}$$



Because

$$\sum_{n=1}^{N-1} (n/N)^k \exp(-\alpha n/N) \sim N J_k(\alpha),$$

calculating $\sum_{n=1}^{N-1} e_{N,n}$ gives the stated entropy rate formula.
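The last step is a Riemann-sum approximation: summing $(n/N)^k e^{-\alpha n/N}$ over $n$ and dividing by $N$ approximates the integral $J_k(\alpha)$. A quick numerical check, assuming the integral defining $J_k$ runs over $[0,1]$ and using arbitrary illustrative values of $\alpha$ and $N$:

```python
import math


def J(k, a, steps=20_000):
    """Midpoint-rule approximation of J_k(a) = integral_0^1 x^k e^{-a x} dx."""
    width = 1.0 / steps
    return width * sum(((i + 0.5) * width) ** k * math.exp(-a * (i + 0.5) * width)
                       for i in range(steps))


def normalized_vertex_sum(k, alpha, N):
    """(1/N) * sum_{n=1}^{N-1} (n/N)^k exp(-alpha n/N)."""
    return sum((n / N) ** k * math.exp(-alpha * n / N) for n in range(1, N)) / N


alpha, N = 1.5, 100_000  # hypothetical parameter values
for k in range(4):
    print(k, normalized_vertex_sum(k, alpha, N), J(k, alpha))
```

The two columns agree up to an error of order $1/N$, which is negligible at the $N \log N$ scale of the entropy rate.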
5.3 Proof of Lemma 3 

Fix $N$. Recall the construction of $\mathcal{G}^{\mathrm{ord}}_N$ involves an "original name process" - letters of the name of vertex $n$ may be copies from previous names or may be from an "original name", independent uniform for different $n$. Consider a single coordinate, w.l.o.g. coordinate 1, of the vertex-names of $\mathcal{G}^{\mathrm{ord}}_N$. For each vertex $n$ this is either from the original name of $n$ or a copy of some previous vertex-name, so inductively the letter at vertex $n$ is a copy of the letter originating at some vertex $1 \le C_1^N(n) \le n$; and similarly the letter at general coordinate $u$ is a copy from some vertex $C_u^N(n)$. Because the copying process is independent of the name origination process, it is clear that the (unconditional) distribution of each name $\mathbf{a}^n$ is uniform on length-$L_N$ words. Moreover it is clear that, for $1 \le i < j \le N$,

$$\text{the two names } \mathbf{a}^i \text{ and } \mathbf{a}^j \text{ are independent uniform on the event } \{C_u^N(i) \ne C_u^N(j)\ \forall u\}. \tag{28}$$

The proof of Lemma 3 rests upon the following lemma, whose proof we defer to the end of Section 5.4.

Lemma 4 For $(I, J)$ uniform on $\{1 \le i < j \le n\}$, write $\theta_{N,n} = P(C_1^N(I) = C_1^N(J))$. Then

$$\max_{2 \le n \le N} \theta_{N,n} = O(1/N) \quad \text{as } N \to \infty.$$

We first use this lemma to prove Lemma 3 in the case where $k = 2$. For $(I, J)$ as in Lemma 4,

$$\Delta_N^{(2)} = \tfrac{1}{2} \max_{2 \le n \le N} E \sum_{\mathbf{x} \in \mathbf{A}^2} \left| E\left( L_N^{-1} \sum_{u=1}^{L_N} 1_{(a_u^I = x_1, a_u^J = x_2)} \,\Big|\, \mathcal{G}_{N,n} \right) - A^{-2} \right| \le \tfrac{1}{2} \max_{2 \le n \le N} \sum_{\mathbf{x} \in \mathbf{A}^2} E \left| L_N^{-1} \sum_{u=1}^{L_N} 1_{(a_u^I = x_1, a_u^J = x_2)} - A^{-2} \right|. \tag{29}$$

By Lemma 4 and (28), the two names $\mathbf{a}^I, \mathbf{a}^J$ are independent uniform on $\mathbf{A}^{L_N}$ outside an event of probability $O(1/N)$. On this exceptional event, we bound the total variation distance appearing in $\Delta_N^{(2)}$ by 1, leading to the second summand in the bound (22). If the two names are independent, then because the sum below has Binomial$(L_N, A^{-2})$ distribution with variance $\le L_N A^{-2}$,

$$E \left| L_N^{-1} \sum_{u=1}^{L_N} 1_{(a_u^I = x_1, a_u^J = x_2)} - A^{-2} \right| \le L_N^{-1/2} A^{-1} \tag{30}$$

which contributes the first summand in the bound (22).
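The variance step can be checked exactly for small parameters. For a Binomial$(L, q)$ count with $q = A^{-2}$, the mean absolute deviation of the empirical frequency from $q$ is at most its standard deviation, which is at most $\sqrt{L A^{-2}}/L = L^{-1/2} A^{-1}$. The sketch below evaluates the left-hand side by summing over the binomial probabilities; the values of $A$ and $L$ are arbitrary illustrative choices.

```python
import math


def mean_abs_deviation(L, q):
    """Exact E|Bin(L, q)/L - q|, summing over the binomial probabilities."""
    return sum(math.comb(L, s) * q**s * (1 - q) ** (L - s) * abs(s / L - q)
               for s in range(L + 1))


A, L = 3, 40          # hypothetical alphabet size and name length
q = A ** -2           # chance that a fixed pair (x1, x2) appears at one coordinate
lhs = mean_abs_deviation(L, q)
rhs = L ** -0.5 / A   # sqrt(L * A^-2) / L
print(lhs, rhs)
```

Summing the right-hand side over the $A^2$ pairs $\mathbf{x}$ and halving gives a quantity of order $A / L^{1/2}$, matching the first summand of (22) with $k = 2$.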

The proof of Lemma 3 for general $k$ is similar. Taking $(I_1, \dots, I_k)$ uniform on the set $\{1 \le i_1 < \dots < i_k \le n\}$, we have the analog of (29):

$$\Delta_N^{(k)} \le \tfrac{1}{2} \max_{2 \le n \le N} \sum_{\mathbf{x} \in \mathbf{A}^k} E \left| L_N^{-1} \sum_{u=1}^{L_N} 1_{(a_u^{I_1} = x_1, \dots, a_u^{I_k} = x_k)} - A^{-k} \right|.$$

The names $\mathbf{a}^{I_1}, \dots, \mathbf{a}^{I_k}$ are independent outside of the "bad" event that some pair within the $k$ random vertices have the same $C_u^N(\cdot)$ value. But the probability of this bad event is bounded by $\binom{k}{2}$ times the chance for a given pair, which, after applying Lemma 4, leads to the second summand of the bound (22). And the upper bound for the term analogous to (30) becomes

$$L_N^{-1/2} A^{-k/2}.$$

5.4 Structure of the directed sparse Erdős-Rényi graph

In order to prove Lemma 4 and later results, we study the original name variables $C_u(i)$ defined at (28). It will first help to collect some facts about the structure of a directed sparse Erdős-Rényi random graph. Write (omitting the dependence on $N$)

$$\mathcal{T}_n = \{1 \le j \le n : \exists\, g \ge 0 \text{ and a path } n = v_0 > v_1 > \dots > v_g = j \text{ in } \mathcal{G}_N\}.$$

We visualize $\mathcal{T}_n$ as the vertices of the tree of descendants of $n$ although it may not be a tree. The next result collects two facts about the structure of $\mathcal{T}_n$ including that for large $N$ it is a tree with high probability.

Lemma 5 For $m < n$ and $\mathcal{T}_n$ as above,

(a) $P(\mathcal{T}_n \cap \mathcal{T}_m \ne \emptyset) \le \dfrac{(\alpha e^{\alpha})^2 + \alpha e^{\alpha}}{N}$,

(b) $P(\mathcal{T}_n \text{ is not a tree}) \le \dfrac{(\alpha e^{\alpha})^3}{N}$.

Proof. First note that for $1 \le j < n \le N$ the mean number of decreasing paths from $n$ to $j$ of length $g \ge 1$ equals $\binom{n-j-1}{g-1} (\alpha/N)^g$. Because $n - j - 1 < N$, this is bounded by $\frac{\alpha}{N} \cdot \frac{\alpha^{g-1}}{(g-1)!}$, and summing over $g$ gives

$$P(j \in \mathcal{T}_n) \le E(\text{number of decreasing paths from } n \text{ to } j) \le \alpha e^{\alpha}/N. \tag{31}$$

We break the event $\mathcal{T}_n \cap \mathcal{T}_m \ne \emptyset$ into a disjoint union according to the largest element in the intersection: $\max \mathcal{T}_n \cap \mathcal{T}_m = j$ for $j = 1, \dots, m$. Now note for $j \le m - 1$, we can write

$$P(\max \mathcal{T}_n \cap \mathcal{T}_m = j) \le E \sum_{x_n^j,\, y_m^j} 1_{(x_n^j \text{ is path in } \mathcal{G}_N)}\, 1_{(y_m^j \text{ is path in } \mathcal{G}_N)} \tag{32}$$

where the sum is over edge-disjoint decreasing paths $x_n^j$ from $n$ to $j$ and $y_m^j$ from $m$ to $j$. Since the paths are edge-disjoint, the indicators appearing in the sum (32) are independent and so we find

$$P(\max \mathcal{T}_n \cap \mathcal{T}_m = j) \le \sum_{x_n^j,\, y_m^j} P(x_n^j \text{ is path in } \mathcal{G}_N)\, P(y_m^j \text{ is path in } \mathcal{G}_N)$$

$$\le \sum_{x_n^j} P(x_n^j \text{ is path in } \mathcal{G}_N) \sum_{y_m^j} P(y_m^j \text{ is path in } \mathcal{G}_N)$$

$$\le (\alpha e^{\alpha}/N)^2;$$

where the sums in the second line are over all paths from $n$ (respectively $m$) to $j$, and the final inequality follows from (31). Now part (a) of the lemma follows by summing over $j < m$ and adding the corresponding bound (31) for the case $j = m$.

Part (b) is proved in a similar fashion. If $\mathcal{T}_n$ is not a tree then for some $j_2 \in \mathcal{T}_n$ and some $j_1 < j_2$ there are two edge-disjoint paths from $j_2$ to $j_1$. For a given pair $(j_2, j_1)$ the mean number of such path-pairs is bounded by $P(j_2 \in \mathcal{T}_n) \times (\alpha e^{\alpha}/N)^2$. By (31) this is bounded by $(\alpha e^{\alpha}/N)^3$, and summing over pairs $(j_2, j_1)$ gives the stated bound. ■
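The path-counting bound (31) can be verified exactly. Summing $\binom{n-j-1}{g-1}(\alpha/N)^g$ over $g$ gives the closed form $p(1+p)^{n-j-1}$ with $p = \alpha/N$, and the same quantity satisfies a first-step recursion. The sketch below computes it both ways and compares with $\alpha e^{\alpha}/N$; the function name and parameter values are illustrative assumptions.

```python
import math


def mean_paths(d, p):
    """Expected number of decreasing paths from n to j = n - d, each edge present w.p. p.

    Condition on the first step: either straight to j, or to an intermediate
    vertex at distance e < d from j.
    """
    m = [0.0] * (d + 1)
    for dist in range(1, d + 1):
        m[dist] = p * (1.0 + sum(m[1:dist]))
    return m[d]


alpha, N = 2.0, 500   # hypothetical parameters
p = alpha / N
for d in (1, 10, N - 1):
    closed_form = p * (1 + p) ** (d - 1)
    print(d, mean_paths(d, p), closed_form, alpha * math.exp(alpha) / N)
```

Since $(1+p)^{d-1} \le e^{p(d-1)} \le e^{\alpha}$ for $d \le N$, the closed form is at most $\alpha e^{\alpha}/N$ uniformly in $n$ and $j$, as (31) states.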

Remark. Note that for part (a) of the lemma we could also appeal to the more sophisticated inequalities of [14] concerning disjoint occurrence of events, which would give the stronger bound $P(\max \mathcal{T}_n \cap \mathcal{T}_m = j) \le P(j \in \mathcal{T}_m) \times P(j \in \mathcal{T}_n)$.

Proof of Lemma 4. Lemma 4 follows from Lemma 5(a) and the observation that $\{C_1^N(i) = C_1^N(j)\} \subseteq \{\mathcal{T}_i \cap \mathcal{T}_j \ne \emptyset\}$.

5.5 Making the vertex labels distinct

In the ordered model studied above, the vertex-names are $(n, \mathbf{a}^n)$, $1 \le n \le N$. In order to study the unordered model described at the start of Section 5, we first must address the fact that the vertex-names $\mathbf{a}^n$, $1 \le n \le N$ may not be distinct.

Lemma 6 Let $\mathcal{G}_N$ be random graphs-with-vertex-names, where (following our standing assumptions) the names have length $\log N / \log A \le L_N = O(\log N)$, and suppose that for some deterministic sequence $k_N = o(N)$, the number of vertices that have non-unique names in $\mathcal{G}_N$, say $V_N$, satisfies $P(V_N > k_N) = o(1)$. Let $\mathcal{G}^*_N$ be a modification with unique names obtained by re-naming some or all of the non-uniquely-named vertices. Then

$$|\mathrm{ent}(\mathcal{G}^*_N) - \mathrm{ent}(\mathcal{G}_N)| = o(N \log N).$$

Proof. The chain rule (4) implies that

$$\mathrm{ent}(\mathcal{G}^*_N) \le \mathrm{ent}(\mathcal{G}_N) + E\, \mathrm{ent}(\mathcal{G}^*_N | \mathcal{G}_N),$$

so we want to show that $E\, \mathrm{ent}(\mathcal{G}^*_N | \mathcal{G}_N)$ is $o(N \log N)$. Considering the number of ways of relabeling $V_N$ vertices,

$$E\, \mathrm{ent}(\mathcal{G}^*_N | \mathcal{G}_N) \le E\big[\log(A^{L_N V_N})\, 1_{(V_N \le k_N)}\big] + E\big[\log(A^{L_N V_N})\, 1_{(V_N > k_N)}\big]$$

$$\le \log(A)\, L_N \left[ k_N + N P(V_N > k_N) \right] = o(N \log N),$$

as desired. ■

Remark. The analogous lemma holds if instead we replace the labels of any random subset of vertices of $\mathcal{G}_N$ to form $\mathcal{G}^*_N$, provided the subset size satisfies the same assumptions as $V_N$.

Lemma 7 For $\mathcal{G}^{\mathrm{ord}}_N$ and $\mathcal{G}_N$,

$$E|\{n : \mathbf{a}^n = \mathbf{a}^m \text{ for some } m \ne n\}| = o(N). \tag{33}$$

Proof. As in Lemma 4, the proof is based on studying the originating vertex $C_i(n)$ (now dropping the notational dependence on $N$) of the letter ultimately copied to coordinate $i$ of vertex $n$ through the "trees" $\mathcal{T}_n$. Given $\mathcal{T}_n$, the copying mechanism that determines the name $\mathbf{a}^n$ evolves independently for each coordinate, and this implies the conditional independence property: for $1 \le m < n \le N$, the events $\{C_i(n) = C_i(m)\}$, $1 \le i \le L_N$ are conditionally independent given $\mathcal{T}_m$ and $\mathcal{T}_n$. Because

$$P(a_i^n = a_i^m | \mathcal{T}_n, \mathcal{T}_m) = P(C_i(n) = C_i(m) | \mathcal{T}_n, \mathcal{T}_m) + \tfrac{1}{A} P(C_i(n) \ne C_i(m) | \mathcal{T}_n, \mathcal{T}_m)$$

the conditional independence property implies

$$P(\mathbf{a}^n = \mathbf{a}^m | \mathcal{T}_n, \mathcal{T}_m) = \left[ P(C_1(n) = C_1(m) | \mathcal{T}_n, \mathcal{T}_m) + \tfrac{1}{A} P(C_1(n) \ne C_1(m) | \mathcal{T}_n, \mathcal{T}_m) \right]^{L_N}. \tag{34}$$

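Now we always have $C_1(n) \in \mathcal{T}_n$, so trivially

$$P(C_1(n) = C_1(m) | \mathcal{T}_n, \mathcal{T}_m) = 0 \text{ on } \{\mathcal{T}_n \cap \mathcal{T}_m = \emptyset\}. \tag{35}$$

We show below that when the sets do intersect we have

$$P(C_1(n) = C_1(m) | \mathcal{T}_n, \mathcal{T}_m) \le \tfrac{1}{2} \text{ on } \{\mathcal{T}_n \text{ and } \mathcal{T}_m \text{ are trees}\}. \tag{36}$$

Assuming (36), since $A \ge 2$, for $p \le 1/2$ we have $p + (1-p)/A \le 3/4$, and now combining (34, 35, 36), we find

$$P(\mathbf{a}^n = \mathbf{a}^m | \mathcal{T}_n, \mathcal{T}_m) \le \left(\tfrac{3}{4}\right)^{L_N} 1_{(\mathcal{T}_n \cap \mathcal{T}_m \ne \emptyset)} \text{ on } \{\mathcal{T}_n \text{ and } \mathcal{T}_m \text{ are trees}\}.$$

Now take expectation, appeal to part (a) of Lemma 5, and sum over $m$ to conclude

$$P(\mathcal{T}_n \text{ is a tree},\ \mathbf{a}^n = \mathbf{a}^m \text{ for some } m \ne n \text{ for which } \mathcal{T}_m \text{ is a tree}) \le \left(\tfrac{3}{4}\right)^{L_N} \left( (\alpha e^{\alpha})^2 + \alpha e^{\alpha} \right) \to 0.$$

Now any $n$ for which the name $\mathbf{a}^n$ is not unique is either in the set of $n$ defined by the event above, or in one of the two following sets:

$\{n : \mathcal{T}_n \text{ is not a tree}\}$

$\{n : \mathcal{T}_n$ is a tree, $\mathbf{a}^n = \mathbf{a}^m$ for some $m \ne n$ for which $\mathcal{T}_m$ is not a tree but $\mathbf{a}^n \ne \mathbf{a}^m$ for all $m \ne n$ for which $\mathcal{T}_m$ is a tree$\}$.

The cardinality of the final set is at most the cardinality of the previous set, which by part (b) of Lemma 5 has expectation $O(1)$. Combining these bounds gives (33).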

It remains only to prove (36). For $v \in \mathcal{T}_n$ write $R_v(n)$ for the event that the path of copying of coordinate 1 from $C_1(n)$ to $n$ passes through $v$. We may assume there is at least one edge from $n$ into $[1, n-1]$ (otherwise we are in the setting of (35)). Because, given $\mathcal{T}_n$, the chance that vertex $n$ adopts the label of any given neighbor in $\mathcal{T}_n$ is bounded by $1/2$, we see

$$P(R_v(n) | \mathcal{T}_n) \le \tfrac{1}{2}, \quad v \in \mathcal{T}_n. \tag{37}$$

Similarly by (35) we may assume $\mathcal{T}_n \cap \mathcal{T}_m \ne \emptyset$. By hypothesis $\mathcal{T}_n$ and $\mathcal{T}_m$ are trees, and so there is a subset $\mathcal{M} \subseteq \mathcal{T}_n \cap \mathcal{T}_m$ of "first meeting" points $v$ with the property that the path from $v$ to $n$ in $\mathcal{T}_n$ does not meet the path from $v$ to $m$ in $\mathcal{T}_m$ and

$$\{C_1(n) = C_1(m)\} = \bigcup_{v \in \mathcal{M}} [R_v(n) \cap R_v(m)]$$

with a disjoint union on the right. So

$$P(C_1(n) = C_1(m) | \mathcal{T}_n, \mathcal{T}_m) = \sum_{v \in \mathcal{M}} P(R_v(n) | \mathcal{T}_n) \times P(R_v(m) | \mathcal{T}_m). \tag{38}$$

Now $v \to P(R_v(n) | \mathcal{T}_n)$ and $v \to P(R_v(m) | \mathcal{T}_m)$ are sub-probability distributions on $\mathcal{M}$ and the former satisfies (37). Now (38) implies (36). ■

5.6 The unordered model and its entropy rate 

The model we introduced as $\mathcal{G}_N$ in Section 5.1 does not quite fit our default setting because the vertex-names will typically not be all distinct. However, if we take the ordered model $\mathcal{G}^{\mathrm{ord}}_N$ and then arbitrarily rename the non-unique names, to obtain a model $\mathcal{G}^{\mathrm{ord}*}_N$ say, then Lemmas 6 and 7 imply that only a proportion $o(1)$ of vertices are renamed and the entropy rate is unchanged:

$(\mathcal{G}^{\mathrm{ord}*}_N)$ has entropy rate (20).

Now we can "ignore the order", that is replace the names $\{(n, \mathbf{a}^n)\}$ by the now-distinct names $\{\mathbf{a}^n\}$, to obtain a model $\mathcal{G}^*_N$, say. In this section we will obtain the entropy rate formula for $(\mathcal{G}^*_N)$ as

$$(\text{entropy rate for } \mathcal{G}^*_N) = (\text{entropy rate for } \mathcal{G}^{\mathrm{ord}*}_N) - 1. \tag{39}$$

The remainder of this section is devoted to the proof of (39). Write $\mathcal{H}_N$ for the Erdős-Rényi graph arising in the construction of $\mathcal{G}^{\mathrm{ord}*}_N$; that is, each vertex $n+1$ is linked to each earlier vertex $i$ with probability $\alpha/N$, and we regard the created edges as directed edges $(n+1, i)$. Now delete the vertex-labels; consider the resulting graph $\mathcal{H}_N^{\mathrm{unl}}$ as a random unlabelled directed acyclic graph. Given a realization of $\mathcal{H}_N^{\mathrm{unl}}$ there is some number $1 \le M(\mathcal{H}_N^{\mathrm{unl}}) \le N!$ of possible vertex orderings consistent with the edge-directions of the realization.

Lemma 8 In the notation above,

$$\mathrm{ent}(\mathcal{G}^{\mathrm{ord}*}_N) = \mathrm{ent}(\mathcal{G}^*_N) + E \log M(\mathcal{H}_N^{\mathrm{unl}}). \tag{40}$$

Proof. According to the chain rule (4),

$$\mathrm{ent}(\mathcal{G}^{\mathrm{ord}*}_N) = \mathrm{ent}(\mathcal{G}^*_N) + E\, \mathrm{ent}(\mathcal{G}^{\mathrm{ord}*}_N | \mathcal{G}^*_N).$$

We only need to show

$$\mathrm{ent}(\mathcal{G}^{\mathrm{ord}*}_N | \mathcal{G}^*_N) = \log M(\mathcal{H}_N^{\mathrm{unl}}),$$

which follows from two facts: given $\mathcal{G}^*_N$, all possible vertex orderings consistent with the edge-directions of $\mathcal{H}_N^{\mathrm{unl}}$ are equally likely and there are $M(\mathcal{H}_N^{\mathrm{unl}})$ of these orderings. The latter fact is obvious from the definition and to see the former, consider two such orderings; there is a permutation taking one to the other. Given a realization of $\mathcal{G}^{\mathrm{ord}*}_N$ associated with the realization of $\mathcal{H}_N^{\mathrm{unl}}$, applying the same permutation gives a different realization of $\mathcal{G}^{\mathrm{ord}*}_N$ associated with the same realization of $\mathcal{H}_N^{\mathrm{unl}}$. These two realizations of $\mathcal{G}^{\mathrm{ord}*}_N$ have the same probability, and map to the same element of $\mathcal{G}^*_N$, and (here we are using that the second part of the labels are all distinct) this is the only way that different realizations of $\mathcal{G}^{\mathrm{ord}*}_N$ can map to the same element of $\mathcal{G}^*_N$. ■
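To get a feel for the quantity $M$, the sketch below counts, for a small directed acyclic graph, the vertex orderings in which every edge points from a later vertex to an earlier one. This is one natural reading of "orderings consistent with the edge-directions", taken here as an assumption; the function name and the bitmask dynamic programme are illustrative choices and are only practical for small graphs.

```python
from functools import lru_cache
from math import factorial


def count_orderings(num_vertices, edges):
    """Count orderings of the vertices in which, for every edge (u, v), v is placed before u."""
    must_precede = [0] * num_vertices  # bitmask of vertices required before each vertex
    for u, v in edges:
        must_precede[u] |= 1 << v

    @lru_cache(maxsize=None)
    def extend(placed):
        if placed == (1 << num_vertices) - 1:
            return 1
        total = 0
        for w in range(num_vertices):
            bit = 1 << w
            already_placed = (placed & bit) != 0
            ready = (must_precede[w] & ~placed) == 0
            if not already_placed and ready:
                total += extend(placed | bit)
        return total

    return extend(0)


print(count_orderings(3, []))                # no edges: all 3! orderings
print(count_orderings(3, [(1, 0), (2, 1)]))  # a directed path: one ordering
print(count_orderings(3, [(1, 0), (2, 0)]))  # two edges into vertex 0
```

The two extremes in the bound $1 \le M \le N!$ correspond to a directed path and to the empty graph. In the sparse regime most vertices are only weakly constrained, which is why $\log M$ turns out to be close to $\log N!$.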
So it remains only to prove

Proposition 9

$$E \log M(\mathcal{H}_N^{\mathrm{unl}}) \sim N \log N.$$

Proof. Choose $K_N \sim N^{\varepsilon}$ for small $\varepsilon > 0$ and partition the labels $[1, N]$ into $K_N$ consecutive intervals $I_1, I_2, \dots$ each containing $N/K_N$ labels. Consider a realization of the (labeled) Erdős-Rényi graph $\mathcal{H}_N$. The number $V_i$ of edges with both end-vertices in $I_i$ has Binomial$\left(\binom{N/K_N}{2}, \alpha/N\right)$ distribution with mean $\sim \frac{\alpha N}{2 K_N^2}$, and from standard large deviation bounds (e.g. [8] Theorem 2.15)

$$P\left(V_i \le \frac{\alpha N}{K_N^2},\ \text{all } 1 \le i \le K_N\right) \to 1.$$

For a realization $\mathcal{H}_N$ satisfying these inequalities we have

$$M(\mathcal{H}_N^{\mathrm{unl}}) \ge \left( \left( \frac{N}{K_N} - \frac{2\alpha N}{K_N^2} \right)! \right)^{K_N}.$$

This holds because we can create permutations consistent with $\mathcal{H}_N^{\mathrm{unl}}$ by, on each interval $I_i$, first placing the (at most $\frac{2\alpha N}{K_N^2}$) labels involved in the edges with both ends in $I_i$ in increasing order, then placing the remaining labels in arbitrary order. So

$$E \log M(\mathcal{H}_N^{\mathrm{unl}}) \ge (1 - o(1)) \log \left( \left( \frac{N}{K_N} - \frac{2\alpha N}{K_N^2} \right)! \right)^{K_N} \sim (1 - \varepsilon) N \log N$$

establishing Proposition 9. ■

Remark. Proposition 9 and Lemma 8 are in the spirit of the graph entropy literature, but we could not find these results there. As discussed in Section 2.2, this literature is largely concerned with the complexity of the structure of an unlabeled graph, or in the case of [7], the entropy of probability distributions on unlabeled graphs. A quantity of interest in these settings is the "automorphism group" of the graph which is closely related to $M(\mathcal{H}_N^{\mathrm{unl}})$ here. For example, an analog of (40) is shown in Lemma 1 of [7] and Theorem 1 there uses this lemma to relate the entropy rate between an Erdős-Rényi graph on $N$ vertices with edge probabilities $p_N$ with distinguished vertices and that of the same model where the vertex labels are ignored. Their result is very close to Proposition 9, but [7] only considers edge weights $p_N$ satisfying $N p_N / \log(N)$ bounded away from zero, which falls outside our setting.



6 Open problems 

Aside from the (quite easy) Lemmas 1 and 2, our results concern specific 
models. Are there interesting "general" results in this topic? Here are two 
possible avenues for exploration. 

Given a random graph-with-vertex-names $\mathcal{G} = \mathcal{G}_N$, there is an associated random unlabeled graph $\mathcal{G}^{\mathrm{unl}}$ and an associated random unordered set of names Names, and obviously

$$\mathrm{ent}(\mathcal{G}) \ge \max(\mathrm{ent}(\mathcal{G}^{\mathrm{unl}}), \mathrm{ent}(\mathrm{Names})).$$

Lemmas 1 and 2, applied conditionally as indicated in Section 4.6, give sufficient conditions for $\mathrm{ent}(\mathcal{G}) = \mathrm{ent}(\mathcal{G}^{\mathrm{unl}})$ or $\mathrm{ent}(\mathcal{G}) = \mathrm{ent}(\mathrm{Names})$. In general one could ask "given $\mathcal{G}^{\mathrm{unl}}$ and Names, how random is the assignment of names to vertices?" The standard notion of graph entropy enters here, as a statistic of the "completely random" assignment, so the appropriate conditional graph entropy within a model constitutes a measure of relative randomness. Another question concerns measures of strength of association of names across edges. One could just take the space $\mathbf{A}^{L_N}$ of possible names, consider the empirical distribution across edges $(v, w)$ of the pair of names $(\mathbf{a}(v), \mathbf{a}(w))$ as a distribution on the product space $\mathbf{A}^{L_N} \times \mathbf{A}^{L_N}$ and compare with the product measure using some quantitative measure of dependence. But neither of these procedures quite gets to grips with the issue of finding conceptually interpretable quantitative measures of dependence between graph structure and name structure, which we propose as an open problem.

A second issue concerns "local" upper bounds for the entropy rate. In the classical context of sequences $X_1, \dots, X_n$ from $\mathbf{A}$, an elementary consequence of subadditivity is that (without any further assumptions) one can upper bound $\mathrm{ent}(X_1, \dots, X_n)$ in terms of the "size-$k$ random window" entropy

$$\mathcal{E}_{n,k} := \mathrm{ent}(X_U, X_{U+1}, \dots, X_{U+k-1}); \quad U \text{ uniform on } [1, n-k+1]$$

and this is optimal in the sense that for a stationary ergodic sequence the "global" entropy rate is actually equal to the quantity

$$\lim_{k \to \infty} \lim_{n \to \infty} k^{-1} \mathcal{E}_{n,k}$$

arising from this "local" upper bound. In our setting we would like some analogous result saying that, for the entropy $\mathcal{E}_{N,k}$ of the restriction of $\mathcal{G}_N$ to some "size-$k$" neighborhood of a random vertex, there is always an upper bound for the entropy rate $c$ of the form

$$c \le \lim_{k \to \infty} \lim_{N \to \infty} \frac{\mathcal{E}_{N,k}}{k \log N}$$

and that this is an equality under some "no long-range dependence" condition analogous to ergodicity. But results of this kind seem hard to formulate, because of the difficulty in specifying which vertices and edges are to be included in the "size-$k$" neighborhood.
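The classical statement can be illustrated on a stationary two-state Markov chain, where the law of a length-$k$ window does not depend on its starting position, so the random-window entropy is just the entropy of $(X_1, \dots, X_k)$. The sketch below computes that block entropy by enumeration and compares the per-symbol value with the entropy rate; the transition matrix is a hypothetical example, and natural logarithms are used.

```python
import math
from itertools import product


def block_entropy(k, pi, P):
    """Entropy of (X_1, ..., X_k) for a stationary chain with law pi and transition matrix P."""
    total = 0.0
    for path in product(range(len(pi)), repeat=k):
        prob = pi[path[0]]
        for s, t in zip(path, path[1:]):
            prob *= P[s][t]
        if prob > 0:
            total -= prob * math.log(prob)
    return total


# Hypothetical two-state chain; pi solves pi P = pi.
P = [[0.9, 0.1], [0.3, 0.7]]
pi = [0.75, 0.25]
rate = -sum(pi[s] * P[s][t] * math.log(P[s][t]) for s in range(2) for t in range(2))

for k in (1, 2, 4, 8, 12):
    print(k, block_entropy(k, pi, P) / k, rate)
```

The per-symbol window entropy decreases towards the entropy rate as $k$ grows, which is the behaviour one would like to reproduce for neighborhoods in a graph.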

Acknowledgement. The hybrid model arose from a conversation with 
Sukhada Fadnavis. 

References 

[1] David L. Alderson and John C. Doyle. Contrasting views of complexity and their implications for network-centric infrastructures. Systems, Man and Cybernetics, 40:839-852, 2010.

[2] David Aldous. The continuum random tree. I. Ann. Probab., 19(1):1-28, 1991.

[3] David Aldous and Russell Lyons. Processes on unimodular random networks. Electron. J. Probab., 12:no. 54, 1454-1508, 2007.

[4] David Aldous and J. Michael Steele. The objective method: probabilistic combinatorial optimization and local weak convergence. In Probability on discrete structures, volume 110 of Encyclopaedia Math. Sci., pages 1-72. Springer, Berlin, 2004.

[5] Paolo Boldi and Sebastiano Vigna. The webgraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference, pages 595-601. ACM Press, 2003.

[6] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, pages 219-228, New York, NY, USA, 2009. ACM.

[7] Yongwook Choi and Wojciech Szpankowski. Compression of graphical structures: Fundamental limits, algorithms, and experiments. IEEE Trans. Information Theory, 58:620-638, 2012.

[8] Fan Chung and Linyuan Lu. Complex graphs and networks, volume 107 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC, 2006.

[9] Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2006.

[10] Matthias Dehmer and Abbe Mowshowitz. A history of graph entropy measures. Inform. Sci., 181(1):57-78, 2011.

[11] Ross J. Kang and Colin McDiarmid. The t-improper chromatic number of random graphs. Combin. Probab. Comput., 19(1):87-98, 2010.

[12] Ioannis Kontoyiannis. Pattern matching and lossy data compression on random fields. IEEE Trans. Inform. Theory, 49(4):1047-1051, 2003.

[13] Alon Orlitsky, Narayana P. Santhanam, and Junan Zhang. Universal compression of memoryless sources over unknown alphabets. IEEE Trans. Inform. Theory, 50(7):1469-1481, 2004.

[14] J. van den Berg and Harry Kesten. Inequalities with applications to percolation and reliability. J. Appl. Probab., 22(3):556-569, 1985.

[15] Aaron B. Wagner, Pramod Viswanath, and Sanjeev R. Kulkarni. Probability estimation in the rare-events regime. IEEE Trans. Inform. Theory, 57(6):3207-3229, 2011.



7 Appendix 

Small worlds model: $0 < \gamma < 2$. Here we complete the analysis of the graph entropy rate in the "small worlds" model of Section 4.3. First we show that for $a$ as in (12), the average degree tends to a constant. For $D_u = (n-1)/2 + b$ and $D_l = (n-1)/2 - b$, where $b$ is constant with respect to $N$ (and chosen large enough for the inequalities below to hold), we find using (9) that

-tt /A pD t sec(0) 



/■7T/4 rUl SCC^tfJ 

8a J J r-^ +1 drd9 < ED(v) - 4 



f-n/4 rD u sec(0) 

< 8a / r~ 1+x drd9 + M 



Ji 



2" 



> - 



^rj[ sec 2 -^e)de-^ < ed(v)-a (4i) 

< [ h ' sec2 " 7 (^-|^ + ^7V- 

Taking $a$ and $\kappa_\gamma$ as in (12), the inequalities above imply $E D(v) \to 4 + \alpha$.

To show the entropy rate is as claimed, take $N$ large enough to make $a < 1/2$, so that $\mathcal{E}(a r^{-\gamma})$ is a decreasing function of $r$ for $r \ge 1$. Using the inequality $-(1-x)\log(1-x) \le x$ for $0 < x < 1$,

-8a p/ 4 r D ' scc ( e ) 

£(ar~' y )rdrd0 



log(JV) 



-Rn f w / 4 f- D ;scc(6») 

— - / / r^ +1 log(ar-')drd6, (42) 

log(JV) Jo Jx 

and we will show (42) tends to $\alpha$ as $N \to \infty$. From this point, similar arguments show the same is true with $D_l$ replaced by $D_u$, so that following the arguments that established the convergence of the average degree and using (10), we find

$$\lim_{N \to \infty} \frac{\mathrm{ent}(\mathcal{G}_N)}{N \log(N)} = \frac{\alpha}{2},$$

as desired. To obtain the claimed asymptotic, note that we can write $\log(N)$ times (42) as

$$-8a \log(a) \int_0^{\pi/4} \int_1^{D_l \sec(\theta)} r^{-\gamma+1}\, dr\, d\theta \tag{43}$$

$$+\ 8a\gamma \int_0^{\pi/4} \int_1^{D_l \sec(\theta)} r^{-\gamma+1} \log(r)\, dr\, d\theta. \tag{44}$$



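From (41) above and the definition (12) of $a$, it is easy to see that (43) is

$$\alpha(1 - \gamma/2) \log(N) + o(\log(N)) \quad \text{as } N \to \infty. \tag{45}$$

Now, making the substitution $u = r^{2-\gamma}$, (44) is equal to

$$\frac{8a\gamma}{(2-\gamma)^2} \int_0^{\pi/4} \int_1^{D_l^{2-\gamma} \sec^{2-\gamma}(\theta)} \log(u)\, du\, d\theta$$

$$= \frac{8a\gamma}{(2-\gamma)^2} \int_0^{\pi/4} \Big\{ D_l^{2-\gamma} \sec^{2-\gamma}(\theta) \left[ \log\left(D_l^{2-\gamma} \sec^{2-\gamma}(\theta)\right) - 1 \right] + 1 \Big\}\, d\theta.$$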
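The change of variables $u = r^{2-\gamma}$ can be sanity-checked numerically on the inner integral alone: for any upper limit $R > 1$, $\int_1^R r^{1-\gamma} \log r \, dr = (2-\gamma)^{-2} \left[ U(\log U - 1) + 1 \right]$ with $U = R^{2-\gamma}$. The sketch below compares a composite Simpson approximation with this closed form; the values of $\gamma$ and $R$ are arbitrary illustrative choices.

```python
import math


def simpson(f, lo, hi, n=20_000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3


gamma, R = 0.8, 50.0   # hypothetical exponent in (0, 2) and upper limit
numeric = simpson(lambda r: r ** (1 - gamma) * math.log(r), 1.0, R)
U = R ** (2 - gamma)
closed = (U * (math.log(U) - 1) + 1) / (2 - gamma) ** 2
print(numeric, closed)
```

The leading behaviour for large $R$ is $U \log U / (2-\gamma)^2$, which is the term that survives at order $\log N$ once $R$ is of order $\sqrt{N}$.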

After simplification, the only term that is not $o(\log(N))$ is

$$\frac{8a\gamma}{(2-\gamma)^2}\, D_l^{2-\gamma} \log\left(D_l^{2-\gamma}\right) \int_0^{\pi/4} \sec^{2-\gamma}(\theta)\, d\theta,$$

which after simplification is equal to

$$\frac{\alpha\gamma}{2} \log(N) + o(\log(N)). \tag{46}$$

Combining (45) and (46) with (43) and (44) implies (42) tends to $\alpha$ as $N \to \infty$.